
    Bounding errors of Expectation-Propagation

    Expectation Propagation (EP) is a very popular algorithm for variational inference, but comes with few theoretical guarantees. In this article, we prove that the approximation errors made by EP can be bounded. Our bounds have an asymptotic interpretation in the number $n$ of datapoints, which allows us to study EP's convergence with respect to the true posterior. In particular, we show that EP converges at a rate of $\mathcal{O}(n^{-2})$ for the mean, up to an order of magnitude faster than the traditional Gaussian approximation at the mode. We also give similar asymptotic expansions for moments of order 2 to 4, as well as for the excess Kullback-Leibler cost (defined as the additional KL cost incurred by using EP rather than the ideal Gaussian approximation). All these expansions highlight the superior convergence properties of EP. Our approach for deriving those results is likely applicable to many similar approximate inference methods. In addition, we introduce bounds on the moments of log-concave distributions that may be of independent interest. Comment: Accepted and published at NIPS 201
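
    For orientation, the rates quoted above can be written out. In our own notation (not the paper's; its precise constants and regularity conditions, such as log-concavity, are in the text), with $\mu^*$ the true posterior mean, $\hat{\mu}_{EP}$ the EP estimate and $\hat{\mu}_{G}$ the Gaussian-at-the-mode estimate, the claims read $|\hat{\mu}_{EP} - \mu^*| = \mathcal{O}(n^{-2})$ versus $|\hat{\mu}_{G} - \mu^*| = \mathcal{O}(n^{-1})$: the EP mean improves on the mode-based approximation by a factor of order $n$ as the number of datapoints grows.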

    The Poisson transform for unnormalised statistical models

    Contrary to standard statistical models, unnormalised statistical models only specify the likelihood function up to a constant. While such models are natural and popular, the lack of normalisation makes inference much more difficult. Here we show that inferring the parameters of an unnormalised model on a space $\Omega$ can be mapped onto an equivalent problem of estimating the intensity of a Poisson point process on $\Omega$. The unnormalised statistical model now specifies an intensity function that does not need to be normalised. Effectively, the normalisation constant may now be inferred as just another parameter, at no loss of information. The result can be extended to cover non-IID models, which includes for example unnormalised models for sequences of graphs (dynamical graphs), or for sequences of binary vectors. As a consequence, we prove that unnormalised parametric inference in non-IID models can be turned into a semi-parametric estimation problem. Moreover, we show that the noise-contrastive divergence of Gutmann & Hyvärinen (2012) can be understood as an approximation of the Poisson transform, and extended to non-IID settings. We use our results to fit spatial Markov chain models of eye movements, where the Poisson transform allows us to turn a highly non-standard model into vanilla semi-parametric logistic regression.
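
    To make the reduction to logistic regression concrete, here is a minimal sketch under our own toy assumptions (a 1-D unnormalised Gaussian model, a known noise density, and hypothetical names throughout); it illustrates noise-contrastive estimation in the spirit of the paper, not the paper's actual code.

```python
# Minimal sketch, assuming a toy 1-D unnormalised model; all names are ours.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

def f(x, theta):
    # unnormalised log-density: Gaussian with unknown mean, constant dropped
    return -0.5 * (x - theta) ** 2

x_data = rng.normal(1.0, 1.0, size=500)         # observations (true theta = 1)
nu = 5                                          # noise-to-data ratio
x_noise = rng.normal(0.0, 2.0, size=nu * x_data.size)
log_q = lambda x: norm.logpdf(x, 0.0, 2.0)      # known noise density

def neg_loglik(params):
    theta, c = params        # c plays the role of the log normalising constant
    logit_data = f(x_data, theta) + c - log_q(x_data) - np.log(nu)
    logit_noise = f(x_noise, theta) + c - log_q(x_noise) - np.log(nu)
    # logistic regression: data labelled 1, noise labelled 0
    return (np.log1p(np.exp(-logit_data)).sum()
            + np.log1p(np.exp(logit_noise)).sum())

theta_hat, c_hat = minimize(neg_loglik, x0=[0.0, 0.0]).x
# for this toy model the true log Z is log sqrt(2*pi) ~ 0.919
print(f"theta = {theta_hat:.3f}, estimated log Z = {-c_hat:.3f}")
```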

    Divide and conquer in ABC: Expectation-Propagation algorithms for likelihood-free inference

    ABC algorithms are notoriously expensive in computing time, as they require simulating many complete artificial datasets from the model. We advocate in this paper a "divide and conquer" approach to ABC, where we split the likelihood into n factors and combine in some way n "local" ABC approximations of each factor. This has two advantages: (a) such an approach is typically much faster than standard ABC, and (b) it makes it possible to use local summary statistics (i.e. summary statistics that depend only on the data-points that correspond to a single factor), rather than global summary statistics (that depend on the complete dataset). This greatly alleviates the bias introduced by summary statistics, and even removes it entirely in situations where local summary statistics are simply the identity function. We focus on EP (Expectation-Propagation), a convenient and powerful way to combine n local approximations into a global approximation. Compared to the EP-ABC approach of Barthelmé and Chopin (2014), we present two variations: one based on the parallel EP algorithm of Cseke and Heskes (2011), which has the advantage of being implementable on a parallel architecture, and one version which bridges the gap between standard EP and parallel EP. We illustrate our approach with an expensive application of ABC, namely inference on spatial extremes. Comment: To appear in the forthcoming Handbook of Approximate Bayesian Computation (ABC), edited by S. Sisson, L. Fan, and M. Beaumont
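
    The recipe can be made concrete in a few lines. Below is a hedged, one-dimensional illustration of the parallel-EP variant, assuming Gaussian sites held in natural parameters and user-supplied simulators and local summary statistics; every name here (simulate_factor, local_stat, eps, ...) is ours, and this is a sketch of the idea (damping and other safeguards omitted) rather than the authors' implementation.

```python
# Hedged sketch: one Gaussian EP site per likelihood factor, each site
# moment-matched from ABC simulations using only its local summary statistic.
import numpy as np

def parallel_ep_abc(y_factors, simulate_factor, local_stat, prior_nat,
                    eps=0.1, n_sim=10_000, n_iter=10, rng=None):
    rng = rng or np.random.default_rng(0)
    n = len(y_factors)
    r = np.zeros(n)                # site precisions (natural parameters)
    b = np.zeros(n)                # site precision-times-mean terms
    r0, b0 = prior_nat             # Gaussian prior in natural form
    for _ in range(n_iter):
        R, B = r0 + r.sum(), b0 + b.sum()   # current global approximation
        for i in range(n):                  # same (R, B) for all i: parallelisable
            rc, bc = R - r[i], B - b[i]     # cavity: global minus site i
            mu_c, var_c = bc / rc, 1.0 / rc
            theta = rng.normal(mu_c, np.sqrt(var_c), size=n_sim)
            pseudo = simulate_factor(theta, i, rng)       # vectorised simulator
            dist = np.abs(local_stat(pseudo, i) - local_stat(y_factors[i], i))
            keep = dist < eps               # ABC accept/reject on factor i only
            if keep.sum() < 2:
                continue                    # too few acceptances: keep old site
            m, v = theta[keep].mean(), theta[keep].var()
            r[i], b[i] = 1.0 / v - rc, m / v - bc   # moment-matched site update
    R, B = r0 + r.sum(), b0 + b.sum()
    return B / R, 1.0 / R                   # posterior mean and variance
```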

    Spectral properties of kernel matrices in the flat limit

    Kernel matrices are of central importance to many applied fields. In this manuscript, we focus on spectral properties of kernel matrices in the so-called "flat limit", which occurs when points are close together relative to the scale of the kernel. We establish asymptotic expressions for the determinants of the kernel matrices, which we then leverage to obtain asymptotic expressions for the main terms of the eigenvalues. Analyticity of the eigenprojectors yields expressions for limiting eigenvectors, which are strongly tied to discrete orthogonal polynomials. Both smooth and finitely smooth kernels are covered, with stronger results available in the finite smoothness case. Comment: 40 pages, 8 page
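
    The flat-limit scaling is easy to observe numerically. The toy check below (our own, not the paper's code) takes a smooth Gaussian kernel on a handful of fixed 1-D points and shrinks the inverse scale eps: successive eigenvalues of the kernel matrix then separate into groups of order eps^0, eps^2, eps^4, ..., so the log10-ratios of consecutive eigenvalues approach 2*log10(eps).

```python
# Toy numerical check of the flat limit; our own illustration.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=8)                  # fixed 1-D point set
D2 = (x[:, None] - x[None, :]) ** 2            # squared distances

for eps in [1.0, 0.1, 0.01]:
    K = np.exp(-(eps ** 2) * D2)               # Gaussian kernel, inverse scale eps
    ev = np.sort(np.linalg.eigvalsh(K))[::-1]  # eigenvalues, descending
    # log10 ratios of consecutive eigenvalues approach 2 * log10(eps)
    print(eps, np.round(np.log10(ev[1:4] / ev[:3]), 2))
```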

    Estimating the inverse trace using random forests on graphs

    Some data analysis problems require the computation of (regularised) inverse traces, i.e. quantities of the form $\mathrm{Tr}\,(q\mathbf{I} + \mathbf{L})^{-1}$. For large matrices, direct methods are infeasible and one must resort to approximations, for example using a conjugate gradient solver combined with Girard's trace estimator (also known as Hutchinson's trace estimator). Here we describe an unbiased estimator of the regularised inverse trace, based on Wilson's algorithm, an algorithm that was initially designed to draw uniform spanning trees in graphs. Our method is fast, easy to implement, and scales to very large matrices. Its main drawback is that it is limited to diagonally dominant matrices $\mathbf{L}$. Comment: Submitted to the GRETSI conference
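
    A minimal sketch of the idea, assuming an unweighted graph given as adjacency lists (the implementation and all names are ours). It relies on the identity E[number of roots] = Tr(q(qI + L)^{-1}) for random spanning forests sampled by Wilson's algorithm with killing rate q, so averaging the root count over repetitions and dividing by q estimates Tr((qI + L)^{-1}).

```python
# Hedged sketch of a forest-based inverse-trace estimator; our own code.
import numpy as np

def sample_forest_roots(neighbors, q, rng):
    """One run of Wilson's algorithm with killing rate q; returns the root count."""
    n = len(neighbors)
    in_forest = np.zeros(n, dtype=bool)
    next_node = np.full(n, -1)
    n_roots = 0
    for start in range(n):
        u = start
        while not in_forest[u]:                 # random walk until absorption
            deg = len(neighbors[u])
            if rng.random() < q / (q + deg):    # killed: u becomes a root
                n_roots += 1
                in_forest[u] = True
            else:                               # move; overwriting next_node
                next_node[u] = neighbors[u][rng.integers(deg)]  # erases loops
                u = next_node[u]
        u = start                               # retrace and freeze the path
        while not in_forest[u]:
            in_forest[u] = True
            u = next_node[u]
    return n_roots

def inverse_trace(neighbors, q, n_rep=1000, seed=0):
    """Monte Carlo estimate of Tr((q I + L)^{-1}) via E[#roots] / q."""
    rng = np.random.default_rng(seed)
    return np.mean([sample_forest_roots(neighbors, q, rng)
                    for _ in range(n_rep)]) / q

# Example (3-cycle): Laplacian eigenvalues are {0, 3, 3}, so with q = 1 the
# exact value is 1/1 + 1/4 + 1/4 = 1.5.
# print(inverse_trace([[1, 2], [0, 2], [0, 1]], q=1.0))
```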

    Asymptotic Equivalence of Fixed-size and Varying-size Determinantal Point Processes

    Determinantal Point Processes (DPPs) are popular models for point processes with repulsion. They appear in numerous contexts, from physics to graph theory, and display appealing theoretical properties. On the more practical side of things, since DPPs tend to select sets of points that are some distance apart (repulsion), they have been advocated as a way of producing random subsets with high diversity. DPPs come in two variants: fixed-size and varying-size. A sample from a varying-size DPP is a subset of random cardinality, while in fixed-size "$k$-DPPs" the cardinality is fixed. The latter makes more sense in many applications, but unfortunately their computational properties are less attractive, since, among other things, inclusion probabilities are harder to compute. In this work we show that as the size of the ground set grows, $k$-DPPs and DPPs become equivalent, meaning that their inclusion probabilities converge. As a by-product, we obtain saddlepoint formulas for inclusion probabilities in $k$-DPPs. These turn out to be extremely accurate, and suffer less from numerical difficulties than exact methods do. Our results also suggest that $k$-DPPs and DPPs have equivalent maximum likelihood estimators. Finally, we obtain results on asymptotic approximations of elementary symmetric polynomials which may be of independent interest.
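
    For small ground sets, the exact inclusion probabilities that the saddlepoint formulas approximate can be computed directly from the eigendecomposition of the L-ensemble kernel and elementary symmetric polynomials. A minimal sketch (our code; the formula is the standard $k$-DPP marginal, and the recursion below is where the numerical difficulties alluded to above appear once the ground set gets large):

```python
# P(i in S) = sum_n V[i,n]^2 * lam[n] * e_{k-1}(lam without n) / e_k(lam)
import numpy as np

def elem_sym(lam, k):
    """Elementary symmetric polynomials e_0, ..., e_k of the entries of lam."""
    e = np.zeros(k + 1)
    e[0] = 1.0
    for l in lam:
        e[1:] += l * e[:-1]   # RHS is evaluated to a temporary first, so safe
    return e

def kdpp_inclusion(L, k):
    """First-order inclusion probabilities under the k-DPP with L-ensemble L."""
    lam, V = np.linalg.eigh(L)
    ek = elem_sym(lam, k)[k]
    pi = np.zeros(L.shape[0])
    for n in range(len(lam)):
        w = lam[n] * elem_sym(np.delete(lam, n), k - 1)[k - 1] / ek
        pi += w * V[:, n] ** 2
    return pi                 # sanity check: pi.sum() should equal k
```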

    Modelling fixation locations using spatial point processes

    Whenever eye movements are measured, a central part of the analysis has to do with where subjects fixate, and why they fixated where they did. To a first approximation, a set of fixations can be viewed as a set of points in space: this implies that fixations are spatial data and that the analysis of fixation locations can be beneficially thought of as a spatial statistics problem. We argue that thinking of fixation locations as arising from point processes is a very fruitful framework for eye movement data, helping turn qualitative questions into quantitative ones. We provide a tutorial introduction to some of the main ideas of the field of spatial statistics, focusing especially on spatial Poisson processes. We show how point processes help relate image properties to fixation locations. In particular we show how point processes naturally express the idea that image features' predictability for fixations may vary from one image to another. We review other methods of analysis used in the literature, show how they relate to point process theory, and argue that thinking in terms of point processes substantially extends the range of analyses that can be performed and clarifies their interpretation. Comment: Revised following peer review
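
    As a concrete illustration of the point-process view, fitting a log-linear Poisson intensity to fixation locations reduces, after discretising the image into bins, to Poisson regression on the bin counts (the integral in the point-process likelihood becomes a sum over bins). The sketch below is our own toy version, with a simulated feature map and simulated counts, not the paper's analysis.

```python
# Toy fit of lambda(x) = exp(b0 + b1 * feature(x)) by binned Poisson regression.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
grid = 32                                     # image discretised to 32x32 bins
feature = rng.normal(size=(grid, grid))       # e.g. a local edge-density map

# simulate fixation counts per bin from a known intensity (sanity check)
true_b0, true_b1 = -1.0, 0.8
counts = rng.poisson(np.exp(true_b0 + true_b1 * feature))

def neg_loglik(beta):
    b0, b1 = beta
    log_lam = b0 + b1 * feature
    # Poisson log-likelihood per bin: counts * log(lam) - lam  (+ const)
    return -(counts * log_lam - np.exp(log_lam)).sum()

b0_hat, b1_hat = minimize(neg_loglik, x0=[0.0, 0.0]).x
print(f"recovered b0 = {b0_hat:.2f}, b1 = {b1_hat:.2f}")
```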